Bug 💥
I am trying to train the model on the DocLayNet dataset using multiple GPUs, but training fails with the following error:
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
The error occurs at the first iteration.
The DocLayNet data was downloaded from the link provided in datasets_and_eval.ipynb. Using this data, training on a single GPU completes successfully.
I have changed the code in d2_frcnn_train.py as follows:
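# Assumption: launch is detectron2.engine.launch, so the positional arguments
# after train_d2_faster_rcnn are num_gpus_per_machine, num_machines,
# machine_rank and dist_url.
def main(num_gpus, path_config_yaml, dataset_train, path_weights, config_overwrite,
         log_dir, build_train_config, dataset_val, build_val_config, metric_name,
         metric, pipeline_component_name):
    launch(
        train_d2_faster_rcnn,
        num_gpus,  # GPUs per machine
        1,         # number of machines
        0,         # rank of this machine
        "auto",    # distributed URL
        args=(
            path_config_yaml,
            dataset_train,
            path_weights,
            config_overwrite,
            log_dir,
            build_train_config,
            dataset_val,
            build_val_config,
            metric_name,
            metric,
            pipeline_component_name,
        ),
    )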
I pass these parameters from Datasets_and_Eval.ipynb.
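The debugging hint in the error message could be applied roughly as follows. This is only a sketch: `enable_blocking_launches` is a hypothetical helper name (not part of deepdoctection or detectron2), and it assumes the variable is set before any CUDA work starts, so that worker processes started afterwards inherit it.

```python
import os


def enable_blocking_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 so CUDA kernel launches run synchronously.

    Hypothetical helper, not part of deepdoctection or detectron2. It has to
    be called before the first CUDA call for the setting to take effect.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_blocking_launches()
    # Processes started after this point inherit the environment, so the
    # multi-GPU entry point (the main(...) function above) would be called here.
    print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the stack trace should point at the operation that actually triggered the assert, at the cost of slower execution, so this is meant for a short debugging run only.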

Expected behavior 🧮
Training on multiple GPUs should run without this error and reduce the training time compared to a single GPU.
Desktop (please complete the following information, if any other than the one in the install requirements):
Ubuntu 20.04
CUDA 11.3
torch 1.12.1 (CUDA-enabled)
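For a multi-GPU report it can also help to list how many GPUs the process actually sees. A minimal sketch, assuming PyTorch may or may not be importable in the environment where it runs; `collect_environment` is a hypothetical helper name:

```python
import platform


def collect_environment():
    """Gather basic version info for a bug report (hypothetical helper)."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here, so only OS and Python are reported.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda          # CUDA version torch was built with
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()    # GPUs visible to this process
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```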
Additional context 🧬
Training works fine on a single GPU, but I want to train on multiple GPUs.
Everything else has been modified to match these changes, for example the __init__.py files in the train and deepdoctection folders.