Description
dlt is a package that installs pyarrow in its parquet extra (https://github.com/dlt-hub/dlt/blob/devel/pyproject.toml#L123) and has many other extras as well. We're seeing that pex brings in pyarrow inconsistently if you install dlt with multiple extras as separate dependencies, compared to if you install it with all the extras simultaneously.
To reproduce with pex 2.33.0 on macOS:
pex 'dlt[snowflake]' 'dlt[filesystem]' 'dlt[parquet]' --pip-version=latest --platform=current --resolver-version=pip-2020-resolver -o bug.pex
This produces a PEX in which you cannot import pyarrow. When you run the PEX file verbosely, it prints the following to explain why:
pex: Skipping activation of `pyarrow<18,>=12.0.0; (python_version >= "3.9" and python_version < "3.13") and (extra == "bigquery" or extra == "parquet" or extra == "motherduck" or extra == "athena" or extra == "synapse" or extra == "clickhouse" or extra == "dremio" or extra == "lancedb" or extra == "deltalake" or extra == "pyiceberg")` due to environment marker de-selection
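To make the marker in that log line easier to reason about, here is a minimal sketch of its `extra` clause in plain Python. It is a simplified model for illustration only: it ignores the `python_version` part of the marker, the helper names `parse_extras` and `selects_pyarrow` are hypothetical, and it does not claim to reflect how pex or pip evaluate markers internally. It only shows that the marker holds for `parquet` but not for `snowflake` or `filesystem` when each requirement is looked at in isolation, while the union of the three extras does select pyarrow.

```python
import re

# Extras copied from the marker in the log line above.
PYARROW_EXTRAS = frozenset({
    "bigquery", "parquet", "motherduck", "athena", "synapse",
    "clickhouse", "dremio", "lancedb", "deltalake", "pyiceberg",
})


def parse_extras(requirement):
    """Return the set of extras named in a requirement such as 'dlt[a,b]'."""
    match = re.search(r"\[([^\]]*)\]", requirement)
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def selects_pyarrow(extras):
    """True if at least one of the given extras appears in the marker."""
    return not PYARROW_EXTRAS.isdisjoint(extras)


separate = ["dlt[snowflake]", "dlt[filesystem]", "dlt[parquet]"]

# Each requirement considered on its own.
per_requirement = {req: selects_pyarrow(parse_extras(req)) for req in separate}

# All extras requested for dlt, merged into one set.
merged = set().union(*(parse_extras(req) for req in separate))

for req, selected in per_requirement.items():
    print(f"{req}: pyarrow selected = {selected}")
print(f"merged {sorted(merged)}: pyarrow selected = {selects_pyarrow(merged)}")
```

In this model only `dlt[parquet]` selects pyarrow on its own, so a result that depends on which single requirement's extras are consulted would be consistent with the de-selection message above. That is a guess about the cause, not something we have confirmed in the pex source.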
However, if you install all three extras simultaneously, it creates a PEX in which you can import pyarrow as expected:
pex 'dlt[snowflake,filesystem,parquet]' --pip-version=latest --platform=current --resolver-version=pip-2020-resolver -o good.pex
In both of the above cases, if you use pip install instead of pex, the resulting venv allows you to import pyarrow, as expected.
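To compare the environments side by side, a small standard-library script along these lines can be run with each interpreter. The filename `check_imports.py` and the function `missing_modules` are hypothetical names chosen for this sketch, and it assumes top-level module names such as `pyarrow`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Default to pyarrow, the module this report is about.
    names = sys.argv[1:] or ["pyarrow"]
    missing = missing_modules(names)
    for name in names:
        status = "MISSING" if name in missing else "ok"
        print(f"{name}: {status}")
```

Assuming the PEX files were built without an entry point, as in the commands above, they should behave like a Python interpreter, so `./bug.pex check_imports.py` and `./good.pex check_imports.py` can be compared against `python check_imports.py` inside the pip-installed venv.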