Description
Modin version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest released version of Modin.
- I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import ray
import modin.pandas as pd

## Option 1: UserWarning
ray.shutdown()
# The cluster address was omitted by the reporter.
ray.init(address="ray://", runtime_env={"pip": ['modin']})

@ray.remote
def cause_error():
    data = {'id': ["ajajajajajjajajajajajaja", "jajjajajajajajajajajja"]}
    df = pd.DataFrame(data)
    return df

result_df = ray.get(cause_error.remote())
print(result_df)

## Option 2: ImportError
ray.shutdown()
ray.init(
    address="ray://",
    runtime_env={"pip": ['modin'], "env_vars": {"__MODIN_AUTOIMPORT_PANDAS__": "1"}},
)

@ray.remote
def cause_error():
    data = {'id': ["ajajajajajjajajajajajaja", "jajjajajajajajajajajja"]}
    df = pd.DataFrame(data)
    return df

result_df = ray.get(cause_error.remote())
print(result_df)
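The issue description also mentions a third attempt, in which the pandas dependencies are listed separately in the runtime_env. The exact call was not shared, so the following is only one plausible shape of it, assuming the standard "pip" and "env_vars" runtime_env keys; the helper name build_runtime_env is hypothetical.

```python
from typing import Dict, List, Sequence


def build_runtime_env(
    extra_pip: Sequence[str] = ("numpy", "pytz", "python-dateutil"),
) -> Dict[str, object]:
    """Hypothetical helper: runtime_env that installs Modin plus explicit pandas deps."""
    pip_packages: List[str] = ["modin", *extra_pip]
    return {
        "pip": pip_packages,
        # Ray expects env_vars to map strings to strings, so "1" rather than 1.
        "env_vars": {"__MODIN_AUTOIMPORT_PANDAS__": "1"},
    }


runtime_env = build_runtime_env()
print(runtime_env)
```

On the driver this dict would be passed as ray.init(address="ray://", runtime_env=runtime_env). Per the issue description, this variant still produces the same ImportError.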
Issue Description
When Modin is installed via runtime_env in ray.init() on a pre-initialized Ray cluster, one of two things happens. Without the environment variable (Option 1), Modin emits a UserWarning on every Modin function call, asking that the runtime env set __MODIN_AUTOIMPORT_PANDAS__ to 1. With that variable set (Option 2), an ImportError is reported because the modules numpy, pytz, and dateutil are not found. Listing these packages separately in the runtime_env of ray.init() does not help either; it produces the same error. I don't know whether the example is reproducible for others, because I cannot share the Ray cluster's IP address.
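Since the cluster address cannot be shared, a small probe run as a Ray task may help narrow down what the worker actually sees. This is a minimal diagnostic sketch, not part of Modin or Ray: the function name probe_worker_env is hypothetical, and it only uses the standard library to report the worker's interpreter, the value of __MODIN_AUTOIMPORT_PANDAS__, and which of the named modules can be found without importing them.

```python
import importlib.util
import os
import sys
from typing import Dict, Sequence


def probe_worker_env(
    modules: Sequence[str] = ("numpy", "pytz", "dateutil", "pandas", "modin"),
) -> Dict[str, object]:
    """Hypothetical diagnostic: report interpreter, autoimport flag, and module visibility."""
    # find_spec locates a top-level module without executing it; None means "not found".
    importable = {name: importlib.util.find_spec(name) is not None for name in modules}
    return {
        "python": sys.executable,
        "autoimport_flag": os.environ.get("__MODIN_AUTOIMPORT_PANDAS__"),
        "importable": importable,
    }


# On the cluster (not executed here), the helper could be wrapped as a Ray task:
#   probe = ray.remote(probe_worker_env)
#   print(ray.get(probe.remote()))
print(probe_worker_env())
```

Comparing the output of the task with the same call on the driver would show whether the worker's virtualenv can see numpy, pytz, and dateutil at all.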
Expected Behavior
Modin should install on the Ray cluster via runtime_env without raising any warnings or errors.
Error Logs
(cause_error pid=8379) UserWarning: When using a pre-initialized Ray cluster, please ensure that the runtime env sets environment variable __MODIN_AUTOIMPORT_PANDAS__ to 1
(cause_error pid=8379) UserWarning: Distributing <class 'dict'> object. This may take some time.
UserWarning: When using a pre-initialized Ray cluster, please ensure that the runtime env sets environment variable __MODIN_AUTOIMPORT_PANDAS__ to 1
id
0 ajajajajajjajajajajajaja
1 jajjajajajajajajajajja
(raylet) Error processing line 1 of /tmp/ray/session_2022-10-13_15-28-58_053610_7/runtime_resources/pip/674672cbad3f4fcd366ccd963bbe374546649594/virtualenv/lib/python3.9/site-packages/modin-autoimport-pandas.pth:
(raylet)
(raylet) Traceback (most recent call last):
(raylet) File "/home/ray/anaconda3/lib/python3.9/site.py", line 169, in addpackage
(raylet) exec(line)
(raylet) File "<string>", line 1, in <module>
(raylet) File "/tmp/ray/session_2022-10-13_15-28-58_053610_7/runtime_resources/pip/674672cbad3f4fcd366ccd963bbe374546649594/virtualenv/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
(raylet) raise ImportError(
(raylet) ImportError: Unable to import required dependencies:
(raylet) numpy: No module named 'numpy'
(raylet) pytz: No module named 'pytz'
(raylet) dateutil: No module named 'dateutil'
(raylet)
(raylet) Remainder of file ignored
(raylet) Error processing line 1 of /tmp/ray/session_2022-10-13_15-28-58_053610_7/runtime_resources/pip/674672cbad3f4fcd366ccd963bbe374546649594/virtualenv/lib/python3.9/site-packages/modin-autoimport-pandas.pth:
(raylet)
(raylet) Traceback (most recent call last):
(raylet) File "/home/ray/anaconda3/lib/python3.9/site.py", line 169, in addpackage
(raylet) exec(line)
(raylet) File "<string>", line 1, in <module>
(raylet) File "/tmp/ray/session_2022-10-13_15-28-58_053610_7/runtime_resources/pip/674672cbad3f4fcd366ccd963bbe374546649594/virtualenv/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
(raylet) raise ImportError(
(raylet) ImportError: Unable to import required dependencies:
(raylet) numpy: No module named 'numpy'
(raylet) pytz: No module named 'pytz'
(raylet) dateutil: No module named 'dateutil'
(raylet)
(raylet) Remainder of file ignored
(cause_error pid=8481) UserWarning: Distributing <class 'dict'> object. This may take some time.
id
0 ajajajajajjajajajajajaja
1 jajjajajajajajajajajja
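The "Error processing line 1 of ... .pth" and "Remainder of file ignored" lines come from Python's site module: at interpreter startup, a .pth line that begins with import is executed, and any exception it raises is printed to stderr instead of propagated, which is why the task still returns the DataFrame. The sketch below reproduces that message locally with a throwaway .pth file. It assumes nothing about the real contents of modin-autoimport-pandas.pth; the module name in the .pth line is a hypothetical stand-in for a dependency that cannot be imported at that point.

```python
import contextlib
import io
import os
import site
import sys
import tempfile


def run_pth_demo(pth_line: str) -> str:
    """Process a throwaway .pth file and return what site.py printed to stderr."""
    saved_path = list(sys.path)
    captured = io.StringIO()
    try:
        with tempfile.TemporaryDirectory() as sitedir:
            pth_file = os.path.join(sitedir, "demo-autoimport.pth")
            with open(pth_file, "w", encoding="utf-8") as f:
                # A .pth line starting with "import" is executed, not added to sys.path.
                f.write(pth_line + "\n")
            with contextlib.redirect_stderr(captured):
                # addsitedir adds sitedir to sys.path and processes each *.pth in it.
                site.addsitedir(sitedir)
    finally:
        sys.path[:] = saved_path  # undo the sys.path change made by addsitedir
    return captured.getvalue()


# Hypothetical module name standing in for a dependency that cannot be imported.
log = run_pth_demo("import module_that_is_not_installed_xyz")
print(log)
```

The printed text has the same "Error processing line 1 of ...", traceback, and "Remainder of file ignored" structure as the raylet output above.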
Installed Versions
INSTALLED VERSIONS
commit : 621bc10
python : 3.9.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.14.0-70.26.1.el9_0.x86_64
Version : #1 SMP PREEMPT Fri Sep 2 16:07:40 EDT 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Modin dependencies
modin : 0.16.0
ray : 2.0.0
dask : 2022.01.1
distributed : 2022.01.1
hdk : None
pandas dependencies
pandas : 1.5.0
numpy : 1.23.2
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 53.0.0
pip : 21.2.3
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.5
html5lib : None
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.7.1
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.0
snappy : None
sqlalchemy : 1.4.40
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None